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Abstract 

The dictionary-aided sparse regression (SR) approach has recently emerged as a promising 
alternative to hyperspectral unmixing (HU) in remote sensing. By using an available spectral 
library as a dictionary, the SR approach identifies the underlying materials in a given hyper¬ 
spectral image by selecting a small subset of spectral samples in the dictionary to represent 
the whole image. A drawback with the current SR developments is that an actual spectral 
signature in the scene is often assumed to have zero mismatch with its corresponding dictio¬ 
nary sample, and such an assumption is considered too ideal in practice. In this paper, we 
tackle the spectral signature mismatch problem by proposing a dictionary-adjusted nonconvex 
sparsity-encouraging regression (DANSER) framework. The main idea is to incorporate dic¬ 
tionary correcting variables in an SR formulation. A simple and low per-iteration complexity 
algorithm is tailor-designed for practical realization of DANSER. Using the same dictionary 
correcting idea, we also propose a robust subspace solution for dictionary pruning. Extensive 
simulations and real-data experiments show that the proposed method is effective in mitigating 
the undesirable spectral signature mismatch effects. 

‘Part of this work was published in WHISPERS 2014 [I]. 
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1 Introduction 


Hyperspectral unmixing (HU) aims at decomposing pixels of an hyper spectral image (HSI) into 
constituent spectra that represent some pure materials. HU is useful in a number of applications, 
such as environment surveillance, agriculture, mine detection, and food and medicine analytics. 
As one of the core developments in signal and image processing for HSIs, various HU algorithms 
have been developed in the past two decades from different perspectives, snch as Bayesian inference, 
nonnegative matrix factorization, convex analysis, pure pixel pursuit, and many more; see, e.g., [2113] 
for some overviews. 

Recently, a class of HU algorithms based on spectral libraries has attracted much attention. 
A spectral library is a collection of spectral signatures of materials acquired in controlled or ideal 
environments, e.g., in laboratories. There are several publicly available libraries, provided by 
government agencies and research institutes. For example, the U.S. Geological Survey (U.S.G.S.) 
library [3] contains remotely sensed and extracted spectral signatures of over 1300 materials. Such 
rich knowledge of materials’ spectra in the existing libraries provides new opportunities for HU. By 
using an existing library as a dictionary, and by assuming the linear mixture model, we can treat HU 
as a problem of selecting a small number of spectra from the dictionary to represent all the pixels. 
Such a dictionary-aided semiblind formulation is fundamentally identical to the well-known basis 
selection or sparse regression problem in compressive sensing (CS), and thus many well-developed 
tools from CS can be applied. Fundamentally, there are several advantages with dictionary-aided 
semiblind HU. First, unlike many blind HU approaches (which do not use dictionaries), dictionary- 
aided methods do not require assumptions such as the pure pixel assumption and the sum-to-one 
abundance conditions. Second, dictionary-aided methods may not require knowledge of the number 
of materials contained in the HSIs. 

Several dictionary-aided HU algorithms based on sparse regression were proposed in [SHE]. The 
algorithms in ISIEI and [7] treat the HU problem as a single pixel-based sparse regression (SR) 
problem and a multiple pixel-based collaborative sparse regression (GSR) problem, respectively. 
Classic norm and ^ 2/^1 mixed-norm minimization-based sparse optimization methods are em¬ 
ployed to tackle the formulated problems there. The corresponding optimization problems are 
convex, and thus can be solved efficiently, e.g., by some specialized alternating direction method of 
multipliers (ADMM)-based algorithms. Three main difficulties have been observed when applying 
the algorithms in however: First, the spectral library members (i.e., the recorded material 

spectra) exhibit very high mutual coherence. As is known in CS [91412]. high mutual coherence 
may lead to poor performance when applying ii norm and ^ 2/^1 mixed-norm minimization-based 
sparse optimization. Second, the size of a spectral library is often very large. Consequently, we are 
faced with a large-scale problem, for which computational efficiency becomes an issue. Third, there 
may be mismatches between the actual spectral signatures in the scene and the dictionary samples, 
due to various reasons. Such dictionary mismatches affect the performance of a dictionary-aided 
semiblind HU to an extent which depends on the severity of the mismatches. 

The hrst two difficulties mentioned above have been tackled by employing a dictionary pruning 
method based on multiple signal classification (MUSIC) [8]. MUSIC is a classical subspace method 
in sensor array processing m, and recently finds its application in CS m- In dictionary-aided 
semiblind HU, MUSIC proves to be useful in pre-selecting some relevant spectra from a large 
spectral library. As a result, a size-reduced dictionary can be constructed for the SR and CSR 
algorithms to perform semiblind HU. After dictionary pruning, both the mutual coherence of the 
dictionary and the complexity of the subsequent semiblind HU algorithm can be reduced. 
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However, the third difficulty, spectral signature mismatches, is still not addressed. In practice, 
the mismatch problem arises for several reasons. First, the materials’ spectra may vary from time 
to time, and from site to site, subject to diverse physical conditions, e.g. strength of sunlight and 
temperature |15] . Second, the calibration procedure for spectral signatures may introduce errors. 
Third, the spatial resolutions of spectra in the dictionary can be different from those of the image, 
and that can also result in modeling errors. Spectral mismatches can be rather damaging to the 
existing semiblind HU algorithms; particularly, MUSIC-based dictionary pruning is sensitive to 
spectral signature mismatches, as will be seen in the simulations. 

Contributions In this work, we propose a dictionary-aided HU framework that takes spectral 
signature mismatches into consideration. Our first contribution lies in developing a new dictionary- 
aided HU algorithm. The formulation leading to the new algorithm uses insights of CSR, but 
has two key differences: 1) We model spectral signature mismatches as bounded error vectors, 
and attempt to compensate those errors in the formulation. 2) We employ the nonconvex ^ 2 /^p 
(0 < p < I) quasi-norm as the sparsity-promoting function, instead of the convex ^ 2/^1 mixed 
norm as in CSR [7]. The second endeavor is motivated by the fact that quasi-norm based sparse 
optimization has been demonstrated to exhibit better sparsity promoting performance in certain 
difficult situations, e.g., the high-coherence dictionary case [161118] . Since our formulation considers 
dictionary adjustment, it is more complicated to handle than the previous CSR work. We derive 
the new algorithm by careful design of alternating optimization, and its upshot is that the solution 
update at each iteration involves simple matrix operations. 

The second contribution is a spectral mismatch-robust solution to dictionary pruning. We give 
a robust MUSIC formulation, wherein the goal is to identify spectral signature samples that are 
close to the true materials’ signatures, rather than being exactly equal. At first look, the robust 
MUSIC method seems to be computationally expensive under large dictionary sizes; specifically, 
for every dictionary sample, we need to solve an optimization problem. We show that, however, the 
optimization problem in robust MUSIC can be converted to a single-variable optimization problem, 
thereby being solved with a very low computational cost. Simulations and real data experiment 
are used to show the effectiveness of the proposed algorithm. 

Related Works: While the topic of CS and sparse regression has received enormous attention 
in various fields, there are comparatively fewer works that study sparse regression in the presence 
of dictionary mismatches. Those works usually appear in signal processing, and the application 
is not HU. In |19j . perturbations of dictionaries were modeled as Gaussian noise, and an £i-norm 
regularized total least squares criterion was proposed; there, the focus was the single-measurement 
vector case (or the single-pixel case in our problem), and constraints on the unknowns were not 
considered. In [201121] . dictionary perturbations were modeled as scaling factors on each dictionary 
atom, and the formulated problem is convex. The algorithm in m attacked the dictionary mis¬ 
match problem in CSR-based direction-of-arrival finding. There, the mismatch was characterized 
by a subspace of a structured matrix, and the optimization surrogates there are also convex £ 2/^1 
norm and its smoothed counterparts. We also note that ip quasi-norm based sparse regression 
was applied to HU for single pixel-based unmixing without considering dictionary mismatches [23] . 
Here, our focus is collaborative sparse regression using multiple pixels, which is known to have both 
theoretical and practical advantages over the single pixel-based algorithms; we adopt the nonconvex 
£ 2 !ip quasi-norm, where 0 < p < I, as our sparsity-promoting function, since it has proven to show 
better performance in various applications; and we model spectral mismatches as deterministic 
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bounded errors, which does not require statistical assumptions and may be more flexible. 

Notation: The notations x G M” and X G mean that x and X are a real-valued n-dimensional 

vector and a real-valued m x n matrix, respectively (resp.). The notation x > 0 (resp. X > 0) 
means that x (resp. X) is element-wise non-negative. The ith column of a matrix X G is 

denoted by Xi G M™', and the jth row of X is denoted by x^ . The superscript “T” and “—1” stand 
for the transpose and inverse operations, resp. The orthogonal projector of the range space of X 
is denoted by Px = X{X"’"X)^X"^ , where the superscript “f” stands for the pseudo-inverse; and 
the corresponding orthogonal complement projector is denoted by = I — Px- The £p norm of 
a vector x G M"", p > 1, is denoted by ||a;||p = (X]r=i The £p quasi-norm, 0 < p < 1, is 

denoted by the same notation as above. The mixed ^p/^^-norm or Ip/ iq-qyiasi-norm. is denoted by 
ll^llg.p = 11**11?)^^^- The Frobenious norm is denoted by ||X||i? = ||X|| 2 , 2 - 


2 Background 

2.1 Signal Model and Dictionary-Aided Semiblind HU 

Consider a remotely sensed scene that is composed of mixtures of N different materials. Assuming 
linear mixtures, the measured hyperspectral image can be modeled as 

N 

y[£] = '^anSn[£]+v[£], £ = 1,...,L, (1) 

n=l 


where y[£] G denotes the hyperspectral measurement at the ^th pixel of the image, with M 
being the number of spectral bands; each a„ G M^, n = 1,..., A, represents the spectral signature 
of a particular material, indexed by n here; Sn[£] > 0 is the abundance of material n at pixel £; 
v[£\ G is a noise vector; and L is the number of pixels. For convenience, we will write ([T]) in a 
matrix form 

Y = AS + V, (2) 

where Y = [y[l],... ,y[L]], A = [ai,... ,aw], S = [s[l],... , s[L]], s[£] = [si[£],... ,SAr[^]]^, and 
U = [u[l],...,u[L]]. 

In HU, we aim by identifying A and S from Y. This amounts to a blind separation prob¬ 
lem where hyperspectral signal-specific properties—such as pure pixel and sum-to-one abundance 
conditions—are often utilized to attack the problem in many existing and concurrent HU studies. 
Dictionary-aided semiblind HU takes a different strategy. Motivated by the fact that many spectral 
libraries (e.g., the U.S.G.S. library 0) have been built up in the past decades, its principle is to 
use one such spectral library as a dictionary to infer what are the underlying spectral signatures, 
and hence materials, in the scene. To put this into context, define 


D = [di,...,dx] GM^^^ 


as a spectral dictionary, where each dk G is a previously recorded spectral sample for a specific 
material, and K denotes the dictionary size or the number of spectral samples. A dictionary often 
contains a wide variety of samples of materials, and as such K is large. The key assumption with 
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dictionary-aided semiblind HU is that the dictionary covers the spectral signatures of all materials 
in the scene; that is to say, 


an S {di,..., dx}, for every n = 1,..., N. 

Alternatively, we can write, for each n = 1,..., iV, 

an = for some S {1,..., K}. (3) 

Consequently, the signal model in ([2]) can be written as 

Y = DC+ V, (4) 

where C € is a row-sparse matrix; to be specific, the fc^th row of C, n = 1,..., A^, is the kth. 
row of S, and the other rows of C are all zeros. 

Let us consider the sparse regression approach—currently the main approach for dictionary- 
aided semiblind HU. The idea is to exploit the sparsity of C, thereby attempting to recover the 
indices ki,... ,kn correctly and the abundance matrix S accurately. There is more than one way 
to formulate such a sparse promoting problem (see, e.g., and the references therein), and here 
we are interested in the CSR formulation The CSR formulation is given as follows: 

min ||Y-i:>C'||| + A||C|| 2 i 

s.t. C > 0, 

for some prespecified constant A > 0. Here, notice that 110112,1 = l|c*|| 2 ) which aims at 

promoting row sparsity of C. As can be seen in Problem ([5]), CSR seeks to hnd a nonnegative 
row-sparse C that provides a good approximation to = DC. Problem ([5]) is convex, and a fast 
algorithm based on ADMM has been derived for Problem ([5]) [7]. 


2.2 Dictionary Pruning using the Subspace Approach 

As discussed in the Introduction, large dictionary size and high mutual coherence with the dictio¬ 
nary are two main difficulties encountered in CSR and other sparse regression methods, and these 
two difficulties may be circumvented by applying dictionary pruning. Here, we are interested in a 
subspace-based dictionary pruning method called MUSIC [8]. This subspace method may be best 
described by studying the noiseless case Y = AS. Let Us G denote a matrix that contains 

the first N left singular vectors of P. It can be shown that in the noiseless case and under some 
mild assumption^, we have 

Pus^k = 0 <;=^ dk = ak„ for some n G {1,... , A^}. (6) 

The physical meaning of ([6|) is that if a spectral sample df^ in the dictionary is also one of the 
spectral signatures in the scene, then it must be perpendicular to the orthogonal complement 
signal subspace. Also, the converse is true. From an algorithm viewpoint, the above observation 

^Specifically, we require that S has full row rank, and that spark(Z?) > -|- 1, where spark(X) = r means that 

any r columns of X are linearly independent. Intuitively, these requirements mean that the abundance maps of 
the different materials are sufficiently different, and that any N spectral samples in the dictionary are sufficiently 
different. 
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suggests that we can correctly identify the indices ki,... ,ki\f by the simple closed-form equations 
at the left-hand side (LHS) of (j6|)— at least in the noiseless case. 

In practice, where noise is present, the LHS of ([6|) may not be exactly all-zero. Under such 
circumstances, the following procedure can be used to estimate ki,..., k^: 

1. For k = 1,..., K, calculate 

7MUSIC(^) = II , 1^2 • 

11 \ \ 2 

2. Determine A = \ki ,..., such that for re = 1,..., iV, we have 7MUSic(^n) < 7MUSic(j) for 
all j i A. 

The above procedure is known as MUSIC mM- Also, note that we may use some other hyperspec- 
tral subspace identification algorithms, e.g., HySiMe [21], to estimate the signal subspace matrix 
Us from the noisy Y. MUSIC can in principle be used to perform dictionary-based semiblind HU. 
However, because of its sensitivity to colored noise and modeling error that are usually present in 
real data, it is used as a preprocessing algorithm for CSR (or other sparse regression methods) in 
practice. Specifically, MUSIC is used to discard a large number of spectral samples that yield large 
residuals 7 music(^)- The remaining spectral samples then form a (much) smaller dictionary for 
CSR to operate. Such a dictionary pruning procedure has been found to be able to improve the 
HU performance and speed up the process quite significantly—see [8] for the detail. 

3 Proposed Approach 

The crucial assumption with dictionary-aided semiblind HU is that there is no spectrum mis¬ 
matches; that is, we can always find a dictionary sample that exactly matches an actual spectral 
signature in the scene; cf. Eq. ([3|). As discussed in the Introduction, this may be not the case in 
reality. In this section, we will propose a dictionary-aided semiblind HU that takes into account 
the presence of spectrum mismatches. 

3.1 Dictionary-Adjusted Nonconvex Sparsity-Encouraging Regression (DANSER) 

We assume the following spectrum mismatch model in place of Q: 

dk^ = an + en, n = l,...,N, (8) 

for some Sn £ that characterizes the mismatch between the presumed and actual spectra of 
each material. Particularly, every spectral error is assumed to be bounded: 

||en||2<5, re = l,...,A, 

for some (5 > 0. Physically, our model assumes that the dictionary still covers all the actual spectral 
signatures in the scene, but their “best matched” spectral samples in the dictionary are subject to 
certain perturbations. Also, such perturbations do not go worse than <5^ in terms of magnitude. 

Our rationale is to adjust the dictionary in the CSR formulation. Specifically, we write 

dj^ dp; e/j, k 1,..., K, 
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where each G is a dictionary correction variable and we assume ||efc ||2 < 5. Following the 
CSR formulation in ([5]), we propose a new formulation as follows: 


min h\Y -D'CfF + X\\C\\% 

s.t. ||dfc -dkh <e, k = l,...,K, 
C>0, 


(9) 


where 0<p<l,A>0 and e > 0 are prespecified, and note that HCH^p = 11^*112- Comparing 

the original CSR formulation in ([5]) and the above formulation, we see two differences. First, 
Problem ([9]) adjusts the dictionary to attempt to neutralize the spectrum mismatches. Second, 
Problem Q employs a nonconvex row-sparsity promoting function lie'll^p. The reason is that 
nonconvex Ip quasi-norms may exhibit better sparsity promoting performance than the ^i-norm, 
as reported in the sparse optimization context [TUl[T51[25] . and we endeavor to explore such an 
opportunity to improve sparse regression performance in the HU application. The formulation 
in Q or its variants will be called dictionary-adjusted nonconvex sparsity-encouraging regression 
(DANSER) in the sequel. 


3.2 An Efficient Algorithm for DANSER 

Having expressed the DANSER formulation in the last subsection, we turn our attention to al¬ 
gorithm design for DANSER. A simple approach to handle DANSER is to apply alternating op¬ 
timization: fix D' and optimize Problem ([9]) with respect to (w.r.t.) C at one time, fix C and 
optimize Problem ([9|) w.r.t. D' at another time, and repeat the above cycle until some stopping 
criterion holds. While this approach is doable, our algorithm design experience is that it can lead 
to a computationally expensive algorithm. For instance, the optimization of Problem Q w.r.t. D' 
involves joint adjustment of all the dictionary samples in an inseparable manner, which is compu¬ 
tationally involved for large dictionary sizes. Also, the nonconvex row-sparsity promoting function 
11(7112 p used in Problem Q introduces difficulties in the optimization of Problem ([9]) w.r.t. C. 

In view of the aforementioned issues, we formulate a modified version of Problem (j9]) : 

min -\\Y-HC\\l + -\\H-D'\\l 

++'^) ( 10 ) 

k=l 

s.t. \\d!f, - dkh < e, k = l,...,K, 

(7 > 0, 

where /i, r > 0, and iT is a slack variable. In particular, it can be verihed that if /r = -|-oo and 
T = 0, then Problem m and Problem Q are essentially the same. It should be noted that we 
have applied the variable splitting technique in Problem (IlOp (specihcally, to the variable (7), which 
is a commonly used trick in contexts such as image reconstruction |26It28j . 

The modified DANSER formulation in (1101) can be handled in a low per-iteration complexity 
fashion. To describe it, let us hrst consider the following lemma [291131 j : 
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Lemma 1 Let 4>p{w) = 


- tLP f 2,„V-2 


w \ ’’ +TW, where 0 < p < 2, r > 0. The function (j)p{w) is strictly 


convex on w >t). Also, 4>p{w) satisfies the following identity 

ix^ + = min w ■ + finiw) 

and the solution to the problem above is uniquely given by 

Wopt = + t) ^ . 

By Lemma [U Problem (1101) can be equivalently expressed as 

min -\\Y-HC\\l + -\\H-D'\\l 
H,c,D',{wk} 2" 2" 


( 11 ) 


K 


k=l 



k 

\Wk 

C 


+ (l)p{Wk) 


( 12 ) 


s.t. ||dfc - dfc ||2 < e, A: = 1,...,IP, 

C > 0, 

> 0, A: = 1,... ,ii:, 

Now, our strategy is to perform alternating optimization w.r.t. H, D', {wk}, c^,..., . As we 

will see soon, the merit of doing so is that every update admits a computationally light solution. 
First, we examine the optimization w.r.t. H. One can easily see that the solution is 

H := {pD' + YC^) {CC^ + pi) . (13) 

Second, the optimization w.r.t. D' is separable w.r.t. d\,... i-e., for A: = 1,... , it', we have 


min \\d'^-hk\\l 
s.t. \\d!k - dkh < e. 

Problem (|14p is a projection problem and the solution is 


(14) 


di := 


dfc + e 


\\hk-dk\\2 ■ 


\\hk — dk\\2 < e 

otherwise 


(15) 


Third, to check the solution w.r.t. c^, let us hrst re-write the optimization w.r.t. C as 

min Y — HC\? 

C F 

S.t. C > 0, 


where 


Y = 


\[^Y 
V 2 

, H = 

' \/Ih' 

0 


_Diag(0)_ 















and 6 := [ ^/wlX ,..., yJwxX ]^. Then, the subproblem w.r.t. can be expressed as 


mm 


Yk - h.d' 


(16) 


s.t. > 0, 


where 


Yu = 




1 - 

1 _ 

0 


VwkXfk 


where fu is the feth column of the K x K identity matrix. Problem (|16p is known to have a simple 
solution [MIES], given by 


{cY : = 


[Y^H]:,k - C^iH^Hlk + {c’^f[H^H]k,k 


\HTH] 


(17) 


k,k 


where (x)+ = max{0,x}. Notice that using the update (I17p is desirable: the large matrix products 
Y^H and H both only need to be calculated once before updating Finally, by 

Lemma [H the solution w.r.t. {wk} is 


Wk = {p/2){\\c'^\\1 + t)^ k = l,...,K. 


(18) 


The alternating optimization process described above is summarized in Algorithm (H and we 
simply call it DANSER. The DANSER algorithm has the following solution convergence guarantee. 


Proposition 1 Every limit point of the solution sequence produced by DANSER (Algorithmic is 
a stationary point of Problem m- 

The proof of the above proposition is relegated to Appendix 0 Proposition [1] indicates that, 
although we have been dealing with Problem (llOp indirectly, a stationary point of Problem (llOp may 
be expected. Eollowing Proposition (H we can stop DANSER by checking the relative or absolute 
change of the solution C. Notice that since Problem (1101) is nonconvex, a good initialization would 
help DANSER converge to a better solution. In practice, one can use the CSR solution mentioned 
in Section II.A to initialize DANSER. 

Remark 1 By analyzing the per-iteration complexity of DANSER, one can verify that the com¬ 
plexities of many operations scale with K (i.e., the size of the dictionary) or higher. Eor example, to 
solve (fT^ . the operations CC^ and Y cost 0{K‘^L) and 0{MKL) flops, respectively; and the 
matrix inversion requires 0{K^) flops. Plus, although solving the problems w.r.t. is easy, these 
procedures have to be repeated K times at each iteration. Practically, it is therefore motivated to 
use a dictionary with a smaller size, or, to prune the dictionary in advance. However, due to the 
existence of spectral signature mismatches, directly applying MUSIC as in [8] for this purpose is 
not appropriate any more. To address this problem, a robust dictionary pruning method will be 
proposed in the next subsection. 
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Algorithm 1: DANSER 


input : (A, r,p,/i, e); D; C (initialization); Y. 

1 Wk = (p/2)(||c^||| + for /c = 1,..., A. 

2 repeat 

Unmixing: construct Q := [ \/w \\,..., y/wkX 


4 

5 

6 


7 

8 
9 


10 

11 

12 


13 

14 




\[Ih 

V 2 

0 

, H = 

V 2 

_Diag(0)_ 


let F := Y^H and G := H^H; 
for k = 1 : K do 
update by 

+ {c'^fGk,k 

Gk,k 

end 

Dictionary Adjusting: update H by 



H := (pD' + YC'^) {CC^ + ^lI) 


Update Slack Variable: 
for k = 1 : K do 


di := 



^k. 


\\hk-dk\\2 ’ 


end 

Reweighting: update {wk} by 


ll^fc — dk\\2 < e 

o.w. 


Wk = {p/2){\\c% + T)^P-^y^, k = l,...,K. 

15 until some stopping criterion is satisfied] 

output: C. 


3.3 Robust MUSIC for Dictionary Pruning 

Consider the MUSIC procedure back in Section II.B. In particular, recall that the metric 

dfiPfif dk 

7MUSIc(fc) = — II , n2 (19) 

IIII2 

should yield a small value when dk exactly matches an actual spectral signature in the scene, and 
this property has been used as the way to prune the dictionary in the MUSIC procedure. Now, in 
the presence of dictionary mismatches, we propose to replace (I19p by the following robust MUSIC 
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(RMUSIC) metric 


7RMUSlc(^) 


min 


\\dk-^M 


s.t. 11^112 < e, 


(20a) 

(20b) 


where e > 0 is prespecified. The idea is the same as the DANSER development in the above 
subsections—adjust the dictionary to find a better match, this time in a subspace sense. 

The key issue with realizing RMUSIC lies in solving Problem (1201) . Problem (|20l) is a single¬ 
ratio fractional quadratic program, which is quasi-convex and can be solved, e.g., by the Dinkelbach 
algorithm or its variants [341135] . While this means that we can implement RMUSIC by applying 
some existing optimization algorithms, we have to solve K such quasi-convex problems—which is 
still inefficient for large K. However, by carefully examining the problem structure, we find that 
this particular problem can be solved quite easily. To see this, let us re-express 7 rmusic(^) as 


7RMUSIC(^) 


min 

II4I|2<€ 


min 


Pds^d,-^) ^ 

\\Pds^dtk-ml + \\Pus{dk-ml 

vm 

vm + p 


where Pus = UsUg denotes the orthogonal projector of Us, and 


( 21 ) 


Vk{^) 


PUs{o-k - ^)||2 


( 22 ) 


Since the objective function of (|2ip is a monotone increasing function of r/^(^) £ [0,oo), computing 
7RMUSic(^) is the same as finding the minimal value of 7 fc(^) subject to ||^||2 < e. Let us denote 


Til. = mm 

IISIl2<e 


Pusi<^k-^) ^ 
Pus{0-k - ^)||2 


(23) 


We show that 


Proposition 2 The optimal value of Problem (1231) can be found by solving a single-variable problem 


hk = min 


\Pusdk}\2 ~ 


0<d<e ||Pj7s<^fc||2 -I- Ve^ — 


(24) 


The proof of Proposition[2|is relegated to AppendixjBl The message revealed here is quite intriguing— 
the originally quasi-convex problem can be recast into a simple single-variable problem that can be 
easily solved, e.g., by grid search or bisection. Practically, this means that the RMUSIC strategy 
can be implemented quite efficiently. 

As in the previous MUSIC work [8], we use RMUSIC to perform dictionary pruning for 
DANSER. Specifically, we use RMUSIC to select a number of K {K < K) spectral samples from 
D, form a size-iL dictionary, denoted by D here, and then use D as a pruned dictionary to run 
DANSER. We summarize this procedure in Algorithm [21 and we call the procedure RMUSIC- 
DANSER. 
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Algorithm 2: RMUSIC-DANSER 


1 

2 

3 

4 

5 

6 

7 

8 


input : Y; D] e; K. 

apply HySiMe [21] on Y to obtain Us] 
for A: = 1 : RT do 


tjI = mino<6)<e 


end 


^dk\\2—6\ 


7RMUSIc(fc) = 


(7 


11 A/g d*; 11 2 + ^ ’ 

"T+T’ 


determine A = {ki,... , such that 7rmusic(*) < 7RMUSic(i) for any i e A and j ^ A; 


determine D = 




dr. 


feed D and to DANSER (Algorithm [T]) ; 
output: C. 


4 Computer Simulations 


In this section, we use synthetic hyperspectral images to show the effectiveness of the proposed 
approach. In our simulations, the ground-truth spectra are randomly selected from a subset of 
the U.S.G.S. library that has 332 spectral signatures. The abundances are generated following the 
uniform Dirichlet distribution. Throughout this section, we set the number of pixels to be L = 5000. 
The ‘available dictionary’, D, is formed by the same subset of spectra, but a perturbation (i.e., 
for /c = 1, ..., K) is intentionally added to each spectrum. To quantify the ‘mismatch level’ of the 
available dictionary, we define the dictionary to modeling error ratio (DMER) as follows: 

DMER(dB) = lOlogio (||dfc*||i/<52) , 


where k* = argminfc=i^,,,^x l|dfc ||2 and 5 = maxfc=i^,,,^x ll^fclb- The perturbation term e^. follows 
the zero-mean i.i.d. Gaussian distribution and is scaled to satisfy DMER specifications. We also 
define the signal-to-noise ratio (SNR) by SNR = to quantify the noise level, where 

denotes the variance of the additive noise, which is also assumed to be zero-mean i.i.d. Gaussian. 
The choice of the parameter e is as follows 


e 


1 — a 
1 -j- cr 


du 


where a £ [0,1] is given. The parameter a controls the correlation between the RMUSIC/DANSER- 
resulted dictionary member — $ and the original one. Specifically, under ||^||2 < e, it can be 
shown that the above choice of e leads to — “• 

Eigs. [T1I2] show an illustrative example. Eig. [irshows the residues of applying MUSIC and 
RMUSIC to prune the dictionary D. Here, we randomly pick N = 6 spectra as the ground-truth 
materials, and then use the described dictionary D to observe the performance of MUSIC and 
RMUSIC. The parameter of RMUSIC is set to be a = 0.85, and we set DMER = 20dB and 
SNR = 35dB in this case. We see that MUSIC has difficulty in distinguishing several ground- 
truth spectra from the other dictionary members (to be precise, the third and the fourth materials’ 
spectra), but RMUSIC can clearly differentiate the same spectra from the irrelevant spectra. Eig. [2] 
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Figure 1: The projection residues of MUSIC and RMUSIC. 


compares the unmixing performance of DANSER and CSR using the same case, where the pruned 
dictionary with 40 spectra is obtained by RMUSIC. Here, the CSR part is performed by the 
CLSUnSAL algorithm [7], which is considered as a state-of-the-art. For DANSER, we set p = 0.5, 
A = 0.04 for this case. For CSR, the regularization parameter is A = 0.005. In this example 
and the forthcoming simulations and real experiment, we feed the solution of CSR to DANSER as 
initialization. We see that RMUSIC-DANSER yields much row-sparser C than that of RMUSIC- 
CSR, and all of the desired spectra have been successively identified by DANSER. 

In the following, we use Monte Carlo simulations to evaluate the performance of the proposed 
algorithms. Two performance discriminators will be used throughout this section. First, to measure 
the dictionary pruning performance, we dehne the following detection probability 

Pr IA C a| 

where A = {ki,... ,k]s[} denotes the index set that indicates the ground-true spectra, and A C 
{1,... , AT} denotes an index selection subset outputted by a dictionary pruning algorithm. Also, 
we will use K to denote the size of the pruned dictionary. Second, to measure the unmixing 
performance, we calculate the following the signal to reconstruction error (SRE) [5HZ]: 

SRE(dB) = lOlogio 

where C is the true row-sparse abundance matrix (see ([2|) in Section II.A), and C is the output of 
an unmixing algorithm. 

In Fig. [3l we show the index set detection probabilities of MUSIC and RMUSIC under various 
DMERs. In each trial, N = 8 materials are randomly picked. The SNR in this simulation is set 
to be 35dB, and AT = 40 is employed. The results are averaged from 1000 trials. One can see that 
MUSIC is sensitive to dictionary mismatches even under high DMERs, and MUSIC is not able to 
identify all the true materials from the dictionary. Generally, using RMUSIC with a = 0.85 and 0.95 
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Figure 2: The 2-norms of c^’s of CSR and DANSER. The black dash lines correspond to the indices 
of the ground-truth materials’ spectra in the dictionary; for the RMUSIC-pruned spectra, we set 
llc^lb = 0. 


both yield much better detection probabilities than MUSIC under all DMERs. Interestingly, one 
can see that RMUSIC with a = 0.75 admits very good detection probabilities when DMER< 20dB; 
however, when the DMER is higher, using a small a leads to a slight performance degradation. The 
reason is that a smaller ol implies that one is allowed to adjust d^’s more significantly in RMUSIC. 
Hence, several similar d^’s may be confused with each other. This observation suggests that a more 
conservative choice of a should be safer for implementing RMUSIC in practice. 

Eig. [4] shows the detection probabilities of RMUSIC and MUSIC under different K’s (the size 
of the pruned dictionary). Setting K to be small may be easier for the sparse regression stage, but 
is considered more aggressive—some spectra corresponding to the ground-truth materials may also 
be discarded. We see that when DMER> 15dB, RMUSIC with K = 20 yields higher detection 
probabilities than that of MUSIC with K = 60, and that RMUSIC with a larger K has a better 
detection performance. 

Eig. [5] and Eig. [6]show the performance of RMUSIC under various SNRs and underlying ground- 
truth materials, respectively. Erom these figures, one can see how this algorithm is scaled by 
different parameters. 

Beginning from Eig. [71 we show the SRE performance of the CSR-based HU algorithms. Specif¬ 
ically, we compare the SREs yielded by the proposed RMUSIC-DANSER and by MUSIC-CSR [8]. 
We also benchmark our algorithm using RMUSIC-CSR for fairness, since we now have seen that 
RMUSIC yields much better dictionary pruning performance. In all the following simulations, we 
hx p = 0.5, p = 10^, T = 10“® for DANSER, no matter how the simulation settings change; the 
sparsity-controlling parameter A for DANSER and CSR are also fixed to be 0.5 and 0.1 except 
specihed. We stop DANSER if < 10“^, where denotes the solution at iter¬ 

ation z, or if the number of iterations reaches 5000. The results in all the following figures of this 
section are averaged from 50 independent trials. 

Eig. |7]shows the SREs of the algorithms under different DMERs. We see that under all DMERs, 
RMUSIC-DANSER yields the highest SREs. We see that RMUSIC-CSR also consistently yields 


14 






















Figure 3: The detection probabilities of RMUSIC/MUSIC under various DMERs and different a’s. 
SNR= 35dB; N = 8] the pruned dictionary size is K = 40; the original dictionary size K = 332. 



Figure 4: The detection probabilities of RMUSIC/MUSIC under various DMERs and different i^’s 
(size of the pruned dictionary), a = 0.85; N = 8] the original dictionary size K = 332; SNR= 35dB. 
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Figure 5: The detection probabilities of RMUSIC under various DMERs and different SNRs. a 
0.85; N = 8; the pruned dictionary size is K = 40; the original dictionary size K = 332. 



Figure 6: The detection probabilities of RMUSIC under various DMERs and different N^s. a 
0.85; N = 8] the pruned dictionary size is RT = 60; the original dictionary size K = 332; SNR 
35dB. 
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Figure 7: The SREs of the algorithms under different DMERs. a = 0.85; N = 8] the pruned 
dictionary size is = 40; the original dictionary size K = 332; SNR= 35dB. 


better SRE performance than that of MUSIC-CSR—this suggests that RMUSIC itself can help 
improve the sparse unmixing performance. The runtime performance of DANSER and CSR (i.e., 
CLSUnSAL) is shown in Table [T] as a reference. We see that DANSER requires more time to 
converge compared to CSR, since it also adjusts the dictionary during its updates. Also, when the 
DMER gets higher, the convergence speed of DANSER improves by 1/3. This intuitively suggests 
that DANSER does put much effort on adjusting the dictionary (i.e., updating H) when the DMER 
is low. 


Table 1: The runtimes (sec.) of DANSER and CSR under various DMERs. a = 0.85; N = 8] the 
pruned dictionary size is = 40; the original dictionary size K = 332; SNR= 35dB. 


Algorithm 

DMER (dB) 

15 

20 

25 

30 

35 

40 

DANSER 

15.9205 

16.2639 

13.0023 

11.0784 

10.7352 

9.9090 

CSR 

0.8687 

0.9123 

1.3611 

1.3472 

1.6298 

1.4699 


Eig. [8]and Eig. Oshow the performance of the algorithms under different number of materials and 
SNRs, respectively. We see that the results are similar to that in Eig. [7] — the SRE performance of 
RMUSIC-DANSER is consistently higher than the other two under comparison. Notice that for the 
SNR= 25dB case, we change A of DANSER and CSR to be 1 and 0.5, respectively, to accommodate 
the situation where the data is more severely corrupted. 

Eig.llOlshows the SREs of the algorithms under different values of K. An interesting observation 
is that using K = 20 yields much better unmixing performance than using K = 00. This results 
may shed some light on choosing K in practice - using a large K may safely capture all the true 
materials in the pruned dictionary, but it may also degrade the unmixing performance since the 
sparse regression-type algorithms are in general in favor of smaller K. 
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Figure 8: The SREs of the algorithms under different N^s. a = 0.85; N = 8; the pruned dictionary 
size is K = 40; the original dictionary size K = 332; SNR= 35dB. 


SNR=25dB SNR=45dB 



Figure 9; The SREs of the algorithms under different SNRs. a = 0.85; N = 8] the pruned 
dictionary size is if = 40; the original dictionary size K = 332. 
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pruned diet, size =20 pruned diet, size =60 



Figure 10: The SREs of the algorithms under different Ks. a = 0.85; N = 8; the original dictionary 
size K = 332; SNR= 35dB. 


Band 30 (wavelength X = 647.7 nm) 



Figure 11: Band 30 (wavelength A = 647.7 nm) of the subimage of AVIRIS Cuprite Nevada data 
set that is used in the experiment of this section. 

5 Real Data Experiment 

In this section, we test the algorithms on the famous AVIRIS Cuprite data set which was captured 
in Nevada, 1997 (see http://aviris.jpl.nasa.gov/html/aviris.freedata.html). This data 
set has been studied for years and the abundance maps of several prominent materials are well 
recognized. The scene originally has 224 spectral bands between 0.4 and 2.5 m, with nominal 
spectral resolution of 10 nm. Low SNR bands, i.e., bands 1"2, 105-115, 150-170, and 223-224, 
have been removed, resulting a total of 188 spectral bands. We take a subimage of the whole 
data set, which consists of 250x191 pixels; see Fig. [TT] for this subimage at spectral band 30. The 
dictionary that we use here is also the same as the one that has been used in the simulations. It 
has been noticed that there are calibration mismatches between the real image spectra of this scene 
and the spectra available in the U.S.G.S. library m- Hence, this dataset is suitable for verifying 
our proposed algorithm. 
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We first apply RMUSIC and MUSIC on this data set. We adopt the following way to qualita¬ 
tively evaluate the performance: From the previous studies in ISHZ], we know that the prominent ma¬ 
terials are Alunite, Buddingtonite, Chalcedony, and Mmontmorillonite. Fig. 1121 shows the residues 
obtained by applying MUSIC (top) and RMUSIC (buttom). For RMUSIC, we set a = 0.85. The 
red circles correspond to the library members associated with Alunite, Buddingtonite, Chalcedony, 
and Mmontmorillonite. We see that the residues given by RMUSIC can be clearly separated into 
two groups, and the group with smaller residues include the spectra of the materials that we wish 
to identify. We should emphasize that the situation in Fig. [12] (bottom) is desirable in practice: 
since the residues associated with the spectra are clearly divided into to two clusters, it is easy to 
decide which spectra should be kept in the pruned dictionary. In this experiment, we simply keep 
the spectra below the green line, which is drawn by visual inspection, and this results in a pruned 
dictionary with K = 79 spectra. Compared to the original size K = 332, RMUSIC successfully 
reduces the dictionary size by 75% while preserving the spectra associated with the prominent 
materials. 

We follow the method as in to compare the abundance map estimation results of RMUSIC- 
CSR and RMUSIC-DANSER. Specifically, we plot the classification maps yielded by the U.S.G.S. 
Tetracorder software [36] and the estimated abundance maps of Alunite, Buddingtonite, Chal¬ 
cedony, and Mmontmorillonite by RMUSIC-CSR and RMUSIC-DANSER in Eig. [13]- Eig. [16] As 
mentioned in the classification maps are based on the older version Cuprite data captured 

in 1995, while the hyperspectral image was captured in 1997, which means that the details of the 
new data may not be fully revealed by the classification maps - but it still makes a good reference 
for visual evaluation. We see that for Alunite, Buddingtonite, and Chalcedony, RMUSIC-CSR and 
RMUSIC-DANSER yield similar abundance maps. However, for Chalcedony and Mmontmoril¬ 
lonite, the abundances given by RMUSIC-DANSER generally have stronger intensities all over the 
area of interest. Also, by enumerating the nonzero rows of the solution, it is noticed that DANSER 
identifies 15 active spectra from the pruned dictionary that consists of 79 spectra, indicating that 
the number of materials in the subimage is 15. This result is close to that yielded by HySiMe |24j : 
HySiMe is reliable in estimating the number of materials, and its estimate of this scene is 16. At 
the same time, CSR selects 19 active spectra. This observation also verifies our claim that using 
ip quasi-norm yields sparser solution. 

6 Conclusion 

In this work, we have developed a dictionary-aided semiblind HU method that takes into account the 
spectral signature mismatch problem. We proposed a dictionary-adjusted CSR formulation with a 
nonconvex collaborative sparsity promoting regularizer. By a careful reformulation, an alternating 
optimization algorithm with simple per-iteration updates was proposed. A new dictionary pruning 
algorithm based on a spectral mismatch-robust MUSIC criterion was also proposed. Simulations 
and real-data experiments showed that the proposed algorithms are promising in improving the 
HU performance compared to the prior works. 
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Figure 12: The MUSIC (top) and RMUSIC (bottom) residues of the real data. 



Figure 13: The U.S.G.S. Tetracorder abundance map (left) and the estimated abundance map of 
the Alunite by RMUSIC-CSR and RMUSIC-DANSER, respectively. 



Figure 14: The U.S.G.S. Tetracorder abundance map (left) and the estimated abundance map of 
the Buddingtonite by RMUSIG-GSR and RMUSIC-DANSER, respectively. 
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Figure 15: The U.S.G.S. Tetracorder abundance map (left) and estimated abundance map of the 
Chalcedony by RMUSIC-CSR and RMUSIC-DANSER, respectively. 



Figure 16: The U.S.G.S. Tetracorder abundance map (left) and estimated abundance map of the 
Mmontmorillonite by RMUSIC-CSR and RMUSIC-DANSER, respectively. 


Appendix 

A Proof of Proposition [1] 

Eirst, we claim that any limit point of the solution sequence generated by the DANSER algorithm in 
Algorithm [T] is a stationary point of Problem (jl2p (but not Problem (jlOp at this moment). The claim 
is obtained by applying a general alternating optimization (AO) result in [371 Proposition 2.7.1], 
which says that every limit point of a solution sequence generated by an AO algorithm is a stationary 
point of its tackled problem if each partial optimization problem in AO is strictly convex and has a 
continuously differentiable objective function within the interior of its feasible set. In our case, one 
can see that the partial optimizations of Problem (1121) w.r.t. H, D', c^,... ,c^ and satisfy 

the above condition. 

Second, we claim that a stationary point of Problem (1121) is also a stationary point of Prob¬ 
lem (llOp . The proof is as follows. For notational convenience, let X = w = 

[rci,..., wk]'^, and denote 


g{X,w) 

fix) 


K 


h{X) + xY, 



k 

[Wk 

C 


+ 4>piwk) 


k=l 


K 

hix) + xY^ 

k=l 



p/2 
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as the objective functions of Problem (1121) and Problem (|10ll . resp., where 

h{X) = ^\\Y-HC\\l + !^\\H-D'\\l. 


Also, recall from the development in Section III.B that 


f{X) = ming(X,m). 

ii;>0 


Now, let {X*,w*) be a stationary point of Problem (I12p . which, by definition, satisfies 

(Vio 5 (X*, — w*) < 0, Vm > 0, (25a) 

Tr {{Vxg{X\w*)f{X - X*)) < 0, VX e A, (25b) 

where Vxg{X,w) and Vwg{X,w) denote the gradient of g{X,w) w.r.t. X and w, resp., and X 
denotes the feasible set of X in Problem (1121) or Problem (IIUI) . From (I25al) . we observe that 


g{X\w*)= min g{X\w), (26) 

t£;>0 

and the argument is as follows: g is strictly convex w.r.t. m > 0 by Lemma [U and as a result 
of the optimality conditions of convex optimization, (I25ap holds if and only if w* is the optimal 
solution to min^„>o g(X*, m). Eq. (|26P implies that f{X*) = g{X*,w*). Consequently, we can 
rewrite (I25bl) as 

Tr {{Vxf{X*)f{X - X*)) < 0, VX G X. 

The above equation is identical to the definition for X* to be a stationary point of Problem (1101) . 
Hence, we have proven that for any stationary point (X*,w*) of Problem (1121) . the part X* is a 
stationary point of Problem (HOI) . 

Finally, combining the above two claims leads to the conclusion in Proposition [TJ 


B Proof of Proposition [2] 


Recall that we aim at solving 


min 

Il^ll2<e 


where 

Pusidk - ^)||2 

By the triangle inequality, we have 



%(^) > 


Pugdk 

2 

Pus^ 

2 

\\Pusdk 

I2 + 1 


2 


(27) 


(28) 


(29) 


where equality holds if and only if i) = I^Pugf^kj /3 > 0, and ii) Pus^ — <^Pusdk, a > 0. The 

two conditions above can be satisfied simultaneously by setting 




a 


iPriad 


Usf^k\\2 


-PUgdk + 


13 


I ^Ug dk\\2 


Pus^k 


(30) 
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(31) 


for some a,/3 > 0. Also, note that ||^||2 < e is equivalent to 


+ < e^. 


Substituting pO)) into and by noting pT]) . we recast Problem p7|) as 


min 

a,0>O 

a^+/3^<e^ 


Pus^k 


\\Pusdk\ 

2 +a 


(32) 


Consider two cases, namely, i) 
is /3 = 




< e^, and ii) 


Pus^k 


> e^. For case i), the optimal (3 


Pus^k 


. For case ii), we 


, and the optimal a may take any value in ^ ^I2J 

observe the following: fixing /3, a should be made as large as possible so as to reduce the objective 
value. Hence, we can substitute a = — /3^ (the largest possible a fixing /3) into Problem 

and simplify the problem to 


= mm 


Pt^^d,ll2-/3 


O<0<e WP^j^dkh + 


(33) 


which is exactly Problem 
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